San Diego State University
Abstract: Racial residential segregation is a longstanding topic of focus across the disciplines of urban social science. Classically, segregation indices are calculated based on areal groupings (e.g. counties or census tracts), with more recent research exploring ways that spatial relationships can enter the equation. Spatial segregation measures embody the notion that proximity to one’s neighbors is a better specification of residential segregation than simply who resides together inside the same arbitrarily-drawn polygon. Thus, they expand the notion of “who is nearby” to include those who are geographically close to each polygon rather than a binary inside/outside distinction. Yet spatial segregation indices often resort to crude measurements of proximity, such as the Euclidean distance between observations, given the complexity and data requirements of calculating more theoretically-appropriate measures, such as distance along the pedestrian travel network. In this paper, we examine the ramifications of such decisions. For each metropolitan region in the U.S., we compute both Euclidean and network-based spatial segregation indices. We use a novel inferential framework to examine the statistical significance of the difference between the two measures, and we then use features of the network topology (e.g. connectivity, circuity, throughput) to explain this difference in a series of regression models. We show that there is often a large, and frequently significant, difference between segregation indices measured by these two strategies. Further, we explain which topology measures reduce the observed gap and discuss implications for urban planning and design paradigms.
An exceedingly common abstraction in applied spatial analysis is the use of Euclidean distance as a proxy measure for geographic proximity (which is, itself, often a proxy for the frequency of social interaction). It is the geographical scientist’s equivalent to the physicist’s spherical cow1, or the economist’s perfect market: a useful abstraction that helps partially explain a much more complex underlying process, however imperfectly. A major difference in spatial analysis, however, is that scientists from many disciplines often fail to realize how simplified the assumption of Euclidean distance is when traversing the built or natural environment. While, in general, simple proximity is a reasonable heuristic for understanding Tobler’s Law (Tobler, 1970), the behavioral realities of movement and social interaction in complex urban environments often require a more thoughtful model.
More directly, cities, regions, and neighborhoods are not featureless planes in which agents have perfect freedom of mobility. Rather, they are multifaceted environments populated by highways, canyons, rivers, mountains, railroad tracks, alleyways, and power plants. To facilitate movement in this environment, an interleaved transportation system provides passageways through discrete locations, and conditions how easy it is to move throughout the region and interact with individuals in other parts of the region. Although pure Euclidean distance can proxy this system, the urban design decisions that govern how and where networks are located, as well as natural features like elevation or bodies of water, play an important, albeit underexamined, role in mediating social interactions.
One particular topic where a full understanding of space would provide significant benefits is segregation analysis, a longstanding topic of focus across the disciplines of urban social science. Classically, segregation indices are calculated based on areal groupings (e.g. counties or census tracts), with more recent research exploring ways that spatial relationships can enter the equation. Spatial segregation measures embody the notion that proximity to one’s neighbors is a better specification of residential segregation than simply who resides together inside the same arbitrarily-drawn polygon. Thus, they expand the notion of “who is nearby” to include those who are geographically close to each polygon rather than a binary inside/outside distinction. Yet spatial segregation measures often resort to crude measurements of proximity, such as the Euclidean distance between observations, given the complexity and data requirements of calculating more theoretically-appropriate measures, such as distance along the pedestrian travel network.
In this paper, we examine the relationship between pedestrian network characteristics and the measurement of metropolitan segregation. In doing so, we examine three research questions in turn: first, how much does the operationalization of space matter for segregation measurement? More specifically, how large is the difference between Euclidean-based and network-based measures of spatial segregation? Second, if differences exist between Euclidean and network measures, are they large enough that they cannot be attributed to chance? Third, what characteristics of the travel network are related to the observed difference in measurement? If there is a large and/or systematic difference between traditional spatial measurements and those leveraging more realistic measurements of distance, then there may be much to learn about the contribution of network structure and design when seeking to maximize urban integration.
Since the inception of city planning, the relationship between social interactions and the built environment has been a topic of intense focus for both social scientists and urban designers (Talen, 2017). The normative concepts of urban utopias prescribed by architects like Ebenezer Howard, Frank Lloyd Wright, and Le Corbusier included distinct visions for how densely populated and separated/integrated land uses could facilitate the ideal level of interaction between a resident and (a) her neighbors, and (b) her natural surroundings (Campbell & Fainstein, 1996; Corbusier, 1986; Howard & Osborn, 2001). Combining these visions with ideas from Wirth (1938) and the famous ‘neighborhood unit plan’ articulated by Perry (1929), large scale developers like James Rouse developed concepts for new towns like Columbia, Maryland that were based largely on the design of insular street networks (Olsen, 2003).
At their best, these designs were intended to foster community for the residents that live within them, and ensure that amenities like school, shopping, employment, and leisure are all within a walkable distance from the neighborhood’s core. From a more cynical perspective, the cul-de-sac patterns and interspersed greenways of the ‘neighborhood unit plan’ helped codify the American ideal of white flight and the picturesque upper-middle class neighborhood, using both urban design and land-use policy as informal mechanisms of residential sorting. Thus, although the arrangement of people in space has been a focus of urban thought for more than a century, it remains an open question how well features of the real urban fabric are represented in quantitative models of social interaction, such as segregation indices–and whether urban design characteristics shape our perception of these patterns.
Now we have both the tools and the logic to test these assumptions and understand the role of abstractions such as Euclidean distance-based measures in our assessment of critical social processes such as residential segregation. Fast graph algorithms allow us to construct more realistic concepts of spatial weights matrices, and computational statistics allow us to construct and test realistic null hypotheses about the allocation of urban population groups. Here, we examine the role of street network topology in the appropriate measurement of urban segregation. Our goals are twofold.
First, we aim to understand the implications of simple Euclidean distance-based abstractions when conducting formal spatial analyses; that is, do we find substantive differences in results when more realistic concepts of spatial relationships (e.g. network connectivity) are considered? Second, we aim to explore the role of urban design elements (particularly the street network configuration) in widening the gap between analytical abstraction and empirical reality. More simply, we aim to understand whether certain elements of the street network are associated with a greater difference in measured segregation. With this knowledge, urban designers and planners can design for more inclusive communities from the outset.
In a foundational contribution, White (1983) conceives of segregation in terms of spatial interaction, and formulates a spatial dissimilarity index using an exponential decay function to weight the proximity between observed census units. Despite the importance of the contribution, the application of White’s technique has never become widespread, perhaps in part because of the difficulty in operationalizing the index prior to modern GIS. Through the 1990s a surge of research on spatial segregation indices examined different methods for incorporating space, leveraging the growing GIS capacity of the era. An important critique of the time is given by Wong (1993), who shows that spatial segregation indices based on contiguity between adjacent units provide poor definitions of the local neighborhood. This criticism arises in part because geographic units are heterogeneously sized and in part because polygon adjacency may be a poor measurement of “nearness”. Additional work has explored the sensitivity of segregation measures to the modifiable areal unit problem (MAUP) (Openshaw, 1984), and by extension, the importance of spatial scale (Wong, 1997; Wong, 2004). Some authors have also developed spatial extensions or decompositions of popular indices such as the Gini index (Dawkins, 2004; Rey & Folch, 2011).
In a canonical contribution to the segregation literature, Reardon & O’Sullivan (2004) develop a generalized framework for creating spatial segregation indices using a generic formulation of the neighborhood. They also show that the spatial information theory index \tilde{H} and the spatial isolation/exposure index \tilde{P}^\ast have the most desirable conceptual and mathematical properties. O’Sullivan & Wong (2007) implement this approach using kernel density estimation to operationalize the notion of the neighborhood in continuous space, overcoming many of the traditional criticisms of spatial segregation measures. In doing so, they provided an important path forward for a body of work that has continued to expand the notion of space.
A variety of authors have also begun to examine the role of spatial scale. In an important advance in segregation methods, Reardon et al. (2008) develop a method for understanding the implications of multiscalar segregation by varying the distance parameter used to compute the local environment in a spatial segregation index. Subsequently, Reardon et al. (2009) and Lee et al. (2008) apply the framework to a large set of metropolitan regions in the U.S., demonstrating a wide variety of macro versus micro-scaled patterns, and other work has explored the role of multiscalar change over time (Bailey, 2012; Fowler, 2016). Another prominent body of research builds on this work, exploring the notion of “egohoods,” where each household has its own concept of the neighborhood that extends outward and partially overlaps with others nearby (Hipp & Boessen, 2013; Petrović et al., 2018, 2019). Even more recently, additional measurement techniques have been developed that help summarize multiscalar patterns using a single index (as opposed to an array or a ratio) (Bézenac et al., 2022; Clark et al., 2015; Olteanu et al., 2019; Östh et al., 2015). This research has provided clear evidence not only of the importance of considering spatial relationships in segregation measurement, but also of the ways that misspecification of space (such as application of an inappropriate scale) can lead to a skewed concept of the phenomenon under study.
Elsewhere, scholars have examined the role of physical barriers and built features of the urban environment in facilitating social contact. For example, Grannis (2005) shows that social interactions are more frequent inside “T-communities” defined by street networks, and Roberto (2018) uses street networks to measure segregation in a small-scale case study, showing that segregation in Pittsburgh is higher when measured according to network distance. These contributions emphasize a long-recognized but understudied element of metropolitan segregation patterns, namely that transport networks, physical barriers, and other factors such as elevation or congestion condition the expected potential for social interaction in space. For example, work in sociology has shown the importance of street network connectivity in fostering social networks inside small urban geographic zones (Grannis, 1998). The natural logic underlying these findings is that street networks can help insulate urban environments and provide greater exposure to residents living inside “the neighborhood” than those who live outside, but this distinction can be masked easily when measuring metropolitan space using Euclidean distances.
Figure 1: Network Distance vs Euclidean Distance in Urban Environments. a — Distance Comparison in San Clemente, b — Distance Comparison in Chicago, c — Distance Comparison in Flint, d — Distance Comparison in Portland, e — Distance Comparison in Miami, f — Distance Comparison in St Louis
A depiction of the difference between network travel distance and “as the crow flies” distance is shown in Figure 1. The figure shows an origin marked with an X in the center, and two different polygons representing a one-mile travel distance using different methods in the cities of San Clemente and Chicago. The small polygon depicts the total extent accessible from the origin point when traveling along the pedestrian network, whereas the larger polygon depicts the 1-mile buffer representing unconstrained travel. It is immediately apparent in the figure that network-constrained travel covers a much smaller footprint than Euclidean distance in the depicted location. Furthermore, the pattern appears to be influenced strongly by the street network and urban design features that characterize the largely suburban region of San Clemente.
Instead of a regular grid that facilitates travel in all directions (like the densely urbanized section of Chicago in Figure 1 (b)), the street network in Figure 1 (a) includes several insular patterns, cul-de-sacs, and 3-way intersections that help channel traffic in certain directions rather than others. Furthermore, the fact that some subdivisions have only a single entrance makes clear how much further a person would need to travel to reach the homes in certain regions (versus how much easier they appear to be reached via the circular buffer). By contrast, the regular gridded pattern in Chicago in Figure 1 (b) allows travel to flow in all directions. Because the origin starts on a street oriented East-West, the polygon covers essentially the entire circular buffer in that direction. The North-South direction is limited, however, for two reasons: first, the traveler needs to reach a cross street before changing direction, and second, the Kennedy Expressway provides a man-made physical barrier that impedes travel in the southwestern direction, creating a hard edge in the inner polygon except along a single passageway. A similar phenomenon impedes traffic in the northward direction, as the network does not extend into Saint Luke Cemetery.
Using evidence from a case study in Pittsburgh, Roberto (2018, p. 28) argues that, “even small positive differences in the city-level results are meaningful and suggest that physical barriers facilitate greater separation between ethnoracial groups and higher levels of segregation.” We agree with the spirit of this assessment; however, we would extend and clarify that physical barriers themselves do not necessarily create greater separation between groups, although action by other parts of the urban system such as inequitable land use planning or racial steering by lenders or agents can (and does) interact with these barriers to create segregated real estate markets and phenomena such as one group living on the “other side of the tracks” (Roberto, 2018).
Further, as Figure 1 shows, it is not simply the presence of physical barriers, but also the geometric design and topological structure of the travel network that facilitates separation between people in urban space. The curvilinear, meandering streets, and abundance of cul-de-sacs in San Clemente stand in sharp contrast to the dense, regular grid in Chicago, even though the network in Chicago also includes additional barriers like highways. In what follows, we examine the magnitude of differences between network and simple Euclidean measures in detail for every metropolitan region in the United States. Specifically, we expand upon prior work in three different directions. First, we widen the geographic scope by considering every metropolitan region in the United States, rather than a case study of a single city. Second, we adopt a computational inference framework that allows us to assess whether the observed differences between the segregation measures are large enough that they could not happen by chance. Finally, we explore the relationship between differences in observed segregation and characteristics of the local travel network.
We begin our analysis by computing two sets of segregation indices, adopting the spatial information theory index \tilde{H} as our measure of segregation. As Reardon et al. (2008, p. 512) describe, “the index \tilde{H} is a measure of how much less diverse individuals’ local environments are, on average, than is the total population of region”, and reaches its maximum of 1 only when “each individual’s local environment is monoracial”. Here, our goal is to test how sensitive the statistic is to different concepts of the “local environment,” with one concept adopting the simplified assumption of Euclidean-based distance measurements, and the other requiring that distance be measured along a pedestrian transport network.
Following Reardon & O’Sullivan (2004) we consider a spatial region populated by M racial groups indexed by m, with \tau and \pi as population density and proportion, respectively. Here we diverge from the classical notation in the segregation literature and instead adopt conventions more common in spatial econometrics and geographic analysis.2 Doing so allows us to strengthen the connection between similar concepts in different disciplines as well as gain finer control over the definition of spatial relationships. Since many spatial segregation measures are implemented in GIS and spatial analysis software designed by geographers, clarifying this connection can help ease interdisciplinary adoption and conversation around spatial segregation measures.
Thus, we index locations as i and j, and we operationalize the concept of spatial relationships using a spatial weights matrix W (Cliff & Ord, 1970). By focusing on W, we are forced “to specify [our] underlying assumptions about socio-spatial proximity”, following the call by Reardon & O’Sullivan (2004, p. 154) for analysis that “compares segregation levels based on different theoretical bases for defining spatial proximity.” Conceptually, the spatial weights matrix W reflects the connectivity graph for the spatial relationship between nodes i and j, and the values w_{ij} encode the intensity of the association \bar{ij}. The spatial weights matrix is a useful and flexible representation of the local neighborhood environment because it provides a generic data structure for encoding spatial relationships, where any link function (\phi, following the notation of Reardon & O’Sullivan (2004)) can be used to specify the proximity between units. Formally,
W = \phi(D) \qquad{(1)}
where \phi is a proximity weighting
function and D is a matrix containing
pairwise distances for all i and j. Classically, W is typically created via binary
connectivity between adjacent units, but a wide variety of other
continuous specifications are also used in practice (Getis,
2009; Halleck Vega & Elhorst, 2015; Rey & Anselin, 2010),
such as the Euclidean distance between observations, or various kernel
or distance-decay functions. Critically, the distance-weighting function
\phi is distinct from the concept of
distance (D), itself, which
could be measured in Euclidean/geodesic distance, minutes of congested
travel time, meters traveled along the sidewalk, or some generalized
measure of utility. Separating these two concepts allows us to consider
alternative distance metrics distinctly from alternative decay
functions. The local environment for a given feature y at location i can then be measured by its spatial
lag, SL, defined as
SL_i = \sum_j w_{ij} y_j \ . \qquad{(2)}
In the spatial econometrics literature, it is common to exclude the diagonal elements from W to differentiate between focal effects and spatial spillovers in regression models, but when the diagonal is filled, then SL_i becomes a consummate measure of the local environment at location i. To compute the spatial multigroup information theory index \tilde{H}, we first calculate local spatially-weighted population proportions as
\tilde{\pi}_{im} = \frac{SL_{im}}{\sum^M_{m=1}{SL_{im}}} \ . \qquad{(3)}
The density at location i is
\tilde{\tau_i} = \frac{\sum^M_{m=1}{SL_{im}}}{\sum^M_{m=1}\sum^I_{i=1}{SL_{im}}} \ . \qquad{(4)}
The entropy of the local environment at each location \tilde{E}_i is
\tilde{E}_i = -\sum^M_{m=1}(\tilde{\pi}_{im})\log_M(\tilde{\pi}_{im}) \ , \qquad{(5)}
where M indicates the number of groups
in the population. Finally,
\tilde{H} = 1-\frac{1}{TE} \sum^I_{i=1} \tilde{\tau_i}\tilde{E}_i \qquad{(6)}
where \tilde{H} is the spatial
information theory index defined by Reardon & O’Sullivan
(2004), T is the total
population of the region, and E is the
entropy of the region’s total population
E = -\sum^M_{m=1}(\pi_m)\log_M(\pi_m). \qquad{(7)}
We perform all calculations using the open-source Python package
segregation (Cortes et al., 2020),
distributed as part of the Python Spatial Analysis Library (PySAL) (Rey, Anselin, et
al., 2021).
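To make the chain from Equation 2 to Equation 7 concrete, the following is a minimal pure-Python sketch, not the API of the `segregation` package. The function names (`spatial_lag`, `entropy`, `spatial_information_theory`) and the toy inputs are hypothetical. The sketch also assumes that, because \tilde{\tau}_i in Equation 4 is already normalized to sum to one, the total population T is absorbed by that normalization, so the weighted entropies are divided by E only.

```python
import math


def spatial_lag(W, y):
    """Equation 2: SL_i = sum_j w_ij * y_j for one group's counts y."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def entropy(shares, base):
    """Entropy with logarithm base M; zero shares contribute nothing."""
    return -sum(p * math.log(p, base) for p in shares if p > 0)


def spatial_information_theory(W, counts):
    """Spatial H for counts[i][m] (unit i, group m) and weights matrix W."""
    n_units, n_groups = len(counts), len(counts[0])
    # Spatial lag of every group's counts (Equation 2), stored as lags[i][m].
    by_group = [spatial_lag(W, [row[m] for row in counts]) for m in range(n_groups)]
    lags = [[by_group[m][i] for m in range(n_groups)] for i in range(n_units)]
    local_totals = [sum(row) for row in lags]  # assumed > 0 for every unit
    grand_total = sum(local_totals)
    # Equation 3: local spatially weighted group proportions.
    pi_local = [[v / t for v in row] for row, t in zip(lags, local_totals)]
    # Equation 4: each unit's share of the spatially weighted population.
    tau = [t / grand_total for t in local_totals]
    # Equation 5: entropy of each local environment.
    e_local = [entropy(p, n_groups) for p in pi_local]
    # Equation 7: entropy of the region's total (unweighted) population.
    region_total = sum(sum(row) for row in counts)
    pi_region = [sum(row[m] for row in counts) / region_total for m in range(n_groups)]
    e_region = entropy(pi_region, n_groups)
    # Equation 6, with T absorbed into the normalized tau (assumption).
    return 1 - sum(t * e for t, e in zip(tau, e_local)) / e_region


# Toy region: two units, two groups, perfectly separated (made-up data).
counts = [[100, 0], [0, 100]]
h_isolated = spatial_information_theory([[1, 0], [0, 1]], counts)  # no spillover
h_mixed = spatial_information_theory([[1, 1], [1, 1]], counts)     # full spillover
```

With identity weights each local environment is monoracial and the index reaches its maximum of 1; when every unit counts fully toward every other, each local environment matches the regional composition and the index falls to 0.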
To understand the implications of different parameterizations of space, we use block group-level data from the US Census American Community Survey (ACS) 5-year sample (2013-2017) with four mutually-exclusive racial groups (non-Hispanic white, non-Hispanic Black, Hispanic, and Asian). Our sample contains data for 380 metropolitan Core Based Statistical Areas (CBSAs) in the United States. Block groups are the smallest geographic unit for which racial and ethnic data are available in the ACS. To compute Euclidean-based spatial segregation measures, our distances are measured between block group centroids; to compute network-based spatial segregation measures, we first attach the block group centroids to the nearest intersection in the travel network, then compute the shortest network-based path between each pair of observations.
Our data on street networks is collected from OpenStreetMap and the
shortest network path is computed using the Python package
pandana (Foti et al.,
2012). To operate efficiently on metropolitan-scale street
networks, the pandana package relies on a graph pre-processing technique
known as contraction hierarchies that simplifies the computation by
removing inconsequential nodes from consideration during the routing
algorithm (Geisberger et al.,
2012). Adopting this heuristic provides a massive computational
boost, allowing the shortest-path algorithm to perform quickly, even
with metropolitan-scale networks. This technique allows us to examine
all metropolitan CBSAs in the country, comprising an analysis that
includes tens of millions of street intersections.
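The two steps described above, attaching each centroid to its nearest intersection and then finding the shortest network path, can be illustrated with a small standard-library sketch. This is not pandana and does not use contraction hierarchies; it swaps in a plain Dijkstra search on a made-up four-node network, and the names `nearest_node` and `shortest_path_lengths`, the coordinates, and the edge lengths are all hypothetical.

```python
import heapq
import math


def nearest_node(point, nodes):
    """Id of the network node closest (in a straight line) to point."""
    return min(nodes, key=lambda n: math.dist(point, nodes[n]))


def shortest_path_lengths(graph, source):
    """Plain Dijkstra; graph[u] = {v: edge_length}. Returns node -> distance."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical U-shaped network (meters): no direct link between a and d,
# as if a barrier such as a river or highway separated them.
nodes = {"a": (0, 0), "b": (0, 1000), "c": (1000, 1000), "d": (1000, 0)}
graph = {
    "a": {"b": 1000.0},
    "b": {"a": 1000.0, "c": 1000.0},
    "c": {"b": 1000.0, "d": 1000.0},
    "d": {"c": 1000.0},
}

# Two made-up block group centroids on either side of the barrier.
centroid_1, centroid_2 = (50, 10), (980, -20)
origin = nearest_node(centroid_1, nodes)
destination = nearest_node(centroid_2, nodes)

network_distance = shortest_path_lengths(graph, origin)[destination]
euclidean_distance = math.dist(centroid_1, centroid_2)
```

In this toy case the two centroids are under one kilometer apart as the crow flies but three kilometers apart along the network, which is the kind of gap the two distance matrices are meant to capture.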
In each metropolitan region, we proceed by creating two different spatial weights matrices by varying the way distance is measured between observations. In both matrices, the proximity-weighting function \phi is a simple linear decay (triangular kernel) encoding a spatial weight that decreases with distance up to a threshold of two kilometers, beyond which observations no longer have an effect (that is, r=2000 meters):
\phi=
\begin{cases}
1- \left( \frac{d_{ij}}{r} \right),& \text{if } d_{ij}\leq r \\
0 & \text{otherwise.}
\end{cases}
\qquad{(8)}
Between the two W matrices, however, we vary the input distance matrix D, between two concepts, Euclidean distance (W_{euc}) and network distance (W_{net}), where network distance is defined as the shortest path along the pedestrian transportation network. In both matrices the diagonal is set to one, indicating that there is no spatial discount for the value located at observation i. Using these weights matrices W_{net} and W_{euc} to build local environments for each metropolitan region in Equation 1 propagates the two constructs through Equations 2-6, yielding two segregation measures \tilde{H}_{net}, \tilde{H}_{euc} and, implicitly, a difference between the two, \Delta_{\tilde{H}} = \tilde{H}_{net} - \tilde{H}_{euc}. The relative difference between segregation measures is the difference divided by the Euclidean measure: \Delta_{pct} = \frac{\Delta_{\tilde{H}}}{\tilde{H}_{euc}}.
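As an illustration of how the same kernel yields different weights under the two distance concepts, here is a minimal sketch of Equation 8 applied to two small distance matrices. The function name `triangular_weights` and the distances are hypothetical, chosen only for illustration; distances are assumed to be in meters.

```python
def triangular_weights(D, r=2000.0):
    """Equation 8 applied elementwise to a distance matrix D (meters)."""
    W = [[(1.0 - d / r) if d <= r else 0.0 for d in row] for row in D]
    # The diagonal is set to one: no spatial discount at the focal unit.
    for i in range(len(W)):
        W[i][i] = 1.0
    return W


# Made-up pairwise distances for three units. Network distances are never
# shorter than the straight-line distances between the same points.
D_euc = [[0, 500, 1500], [500, 0, 1000], [1500, 1000, 0]]
D_net = [[0, 800, 2600], [800, 0, 1800], [2600, 1800, 0]]

W_euc = triangular_weights(D_euc)
W_net = triangular_weights(D_net)
```

Here units 0 and 2 receive a weight of 0.25 under Euclidean distance but drop out of each other's local environment entirely under network distance, because the 2,600 meter path exceeds the threshold.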
We assess the importance of considering network distance in segregation measurement by adopting the inferential framework outlined in Rey, Cortes, et al. (2021) and Cortes et al. (2020). The framework leverages a computational approach to statistical inference using random labelling to compare the observed difference between the two segregation measures (network versus Euclidean) to a counterfactual distribution of differences generated from the same data. More specifically, the measures \tilde{H}_{net}, \tilde{H}_{euc} and \Delta_{\tilde{H}} are computed and recorded for each metro region. As a result of this process, two “spatialized” versions of the metropolitan demographic composition are created, with one dataset representing Euclidean distances and the other representing network-based distances.
We then create two synthetic datasets by pooling the input units from both original datasets and reassigning them at random. For each block group, we randomly reassign the labels (net, euc) to the observed spatial lags from Equation 2. Once all units have been assigned to a group, the segregation measures are re-computed and their difference taken. This process is repeated for 10,000 iterations. By comparing the observed difference in the two segregation measures against a distribution of differences generated via synthetic datasets, we are able to develop inferential statistics using a conventional t-test. Our test, in this case, adopts the null hypothesis that distances come from a common distribution and thus the expected difference in the segregation measures is 0. The p values represent the probability that, under the null, a simulated difference is greater than the observed difference \Delta_{\tilde{H}}.
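The random-labelling step can be sketched as follows. This is an illustrative simplification rather than the implementation in the `segregation` package: the names `h_from_lags` and `random_label_test` and the toy lags are hypothetical, and the sketch summarizes the simulated distribution with an empirical pseudo p-value (the share of simulated differences at least as large as the observed one) instead of the t-test mentioned above.

```python
import math
import random


def h_from_lags(lags, region_entropy):
    """Spatial H from lags[i][m] (Equations 3-6), given the regional entropy E."""
    n_groups = len(lags[0])
    grand_total = sum(sum(row) for row in lags)
    weighted_entropy = 0.0
    for row in lags:
        total = sum(row)  # assumed > 0 for every unit
        local_e = -sum((v / total) * math.log(v / total, n_groups) for v in row if v > 0)
        weighted_entropy += (total / grand_total) * local_e
    return 1 - weighted_entropy / region_entropy


def random_label_test(lags_net, lags_euc, region_entropy, n_iter=1000, seed=0):
    """Swap the (net, euc) labels unit by unit and rebuild the difference in H."""
    rng = random.Random(seed)
    observed = h_from_lags(lags_net, region_entropy) - h_from_lags(lags_euc, region_entropy)
    simulated = []
    for _ in range(n_iter):
        synth_net, synth_euc = [], []
        for net_row, euc_row in zip(lags_net, lags_euc):
            if rng.random() < 0.5:  # reassign this unit's labels at random
                net_row, euc_row = euc_row, net_row
            synth_net.append(net_row)
            synth_euc.append(euc_row)
        simulated.append(
            h_from_lags(synth_net, region_entropy) - h_from_lags(synth_euc, region_entropy)
        )
    # Empirical pseudo p-value (a swapped-in summary, not the paper's t-test).
    p_value = (1 + sum(s >= observed for s in simulated)) / (n_iter + 1)
    return observed, simulated, p_value


# Made-up spatial lags for four units and two groups; E = 1 for a 50/50 region.
lags_net = [[90, 10], [10, 90], [80, 20], [20, 80]]
lags_euc = [[60, 40], [40, 60], [55, 45], [45, 55]]
observed, simulated, p_value = random_label_test(lags_net, lags_euc, 1.0, n_iter=200)
```

When the two sets of lags are identical, every simulated difference equals the observed difference of zero and the pseudo p-value is 1, which matches the null hypothesis of a common distribution.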
Although the Pearson correlation between planar and network based segregation measures is \rho=0.987, our results provide clear evidence that the choice of appropriate distance metric plays an important role in the computation of a spatial segregation index. We highlight this result using the fact that applied segregation research often uses ordinal rankings to describe and compare the magnitude of segregation across a set of places. While still high, the rank correlation between the two measures is considerably lower at \tau=0.90. Substantively, this means that an analysis of segregated metropolitan regions will result in different conclusions regarding the “most segregated” places, depending on which distance measure is employed. A visual comparison of the top 15 most segregated metros is provided in Figure 2, demonstrating how different places exchange ranks, and a scatterplot in the supplementary material portrays the relationship between segregation measured using the two different distance metrics for the sample CBSAs.
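A small example shows how a high linear correlation can coexist with a lower rank correlation. The sketch below assumes that the rank correlation reported above is Kendall's \tau, which the symbol suggests but the text does not state. The function names and the five pairs of index values are made up for illustration and are not results from the analysis.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (no ties)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            score += (sign > 0) - (sign < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical index values for five places under the two distance concepts.
h_euc = [0.10, 0.20, 0.30, 0.40, 0.50]
h_net = [0.12, 0.19, 0.33, 0.31, 0.52]

r = pearson(h_euc, h_net)
tau = kendall_tau(h_euc, h_net)
```

A single swap in rank order (the third and fourth places) barely moves the Pearson correlation but lowers \tau to 0.8, which is why rankings of the "most segregated" places can change even when the two measures are nearly collinear.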